01. Assessment
SOLUTION:
Play out an entire episode, then update V(s) for each state s encountered using returns from the remainder of the episode.
SOLUTION:
Store the experience tuples in a buffer, and then randomly sample a batch for every training iteration.
SOLUTION:
They can be used directly with continuous action spaces.
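
The first two solutions describe procedures that may be easier to follow in code. What follows is a minimal Python sketch, not a reference implementation. It makes several assumptions that are not part of the assessment: an episode is given as a list of (state, reward) pairs, the update is the every-visit variant with an incremental mean, and the names `mc_every_visit_update`, `ReplayBuffer`, `gamma`, and `capacity` are hypothetical.

```python
import random
from collections import defaultdict, deque


def mc_every_visit_update(V, counts, episode, gamma=1.0):
    """Update V in place from one finished episode.

    `episode` is assumed to be a list of (state, reward) pairs, where
    reward is the reward received after leaving that state.
    """
    G = 0.0
    # Walk backwards so G accumulates the return from each step onward.
    for state, reward in reversed(episode):
        G = reward + gamma * G
        counts[state] += 1
        # Incremental mean of all returns observed so far for this state.
        V[state] += (G - V[state]) / counts[state]
    return V


class ReplayBuffer:
    """Fixed-size store of experience tuples (hypothetical helper)."""

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest tuple when full.
        self.memory = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored tuples.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


# Example usage: one short episode, then a small buffer.
V, counts = defaultdict(float), defaultdict(int)
mc_every_visit_update(V, counts, [("a", 1.0), ("b", 2.0)], gamma=0.5)

buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(t, 0, 1.0, t + 1, False)
batch = buffer.sample(2)
```

Random sampling is used in the buffer so that each training batch mixes tuples from different points in time rather than consecutive steps.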